Learning Objective (5 minutes)

  1. What is classification?
  2. What types of classification exist?
  3. Two types of unsupervised classification
  4. Supervised classification and evaluation of the results
## rgeos version: 0.4-3, (SVN revision 595)
##  GEOS runtime version: 3.6.1-CAPI-1.10.1 
##  Linking to sp version: 1.3-1 
##  Polygon checking: TRUE
## rgdal: version: 1.4-4, (SVN revision 833)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/proj
##  Linking to sp version: 1.3-1
## Loading required package: lattice
## Loading required package: latticeExtra
## Loading required package: RColorBrewer
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.

Classification in remote sensing is the process of grouping pixels into classes. Classification can be unsupervised (clustering) or supervised.

Unsupervised classification (clustering) (10 min)

Unsupervised classification means that we do not know the “truth” about an image, but we can still group pixels into classes according to their spectral properties. There are many unsupervised methods, each with advantages for different types of data. The figure below shows some of these methods applied to synthetic datasets (https://scikit-learn.org/stable/modules/clustering.html).

We can see that the methods behave differently depending on the data patterns. An interesting case is the last row, which shows a homogeneous distribution: only three of the nine methods group all pixels into one class.

We can apply some of these methods to our Landsat 8 Manchester scene. We see that KMeans and GaussianMixture give noisy but reasonable results (compare with the RGB image), while DBSCAN produces the noisiest result. The Birch method looks less noisy and gives distinct spatial patterns; however, it takes the longest to run. This method is not available in R, so we will not use it.

K-Means classification

K-means is one of the most popular unsupervised classification methods (Tou and Gonzalez, 1974; Fränti and Sieranoja, 2019). We can view the method through its cost function \(J=\sum\limits_{i=1}^n \min\limits_{j \in \{1..k\}}||x_i-c_j||^2\). This means that we assign n pixels to k classes by minimizing the cost function J. Here \(c_j\), \(j=1..k\), are the centroids, and the distance to the centroids determines the classes. The centroids are first initialized randomly and then adjusted to minimize J.
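To see the idea in isolation before applying it to imagery, here is a minimal toy sketch (separate from the Landsat workflow): k-means on synthetic 2-D points drawn around two known centres, using base R's kmeans().

```r
# Toy sketch: k-means on synthetic 2-D data with two well-separated groups
set.seed(1)
pts <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),   # 50 points near (0, 0)
             matrix(rnorm(100, mean = 5), ncol = 2))   # 50 points near (5, 5)
km <- kmeans(pts, centers = 2, algorithm = "Lloyd")
km$centers         # estimated centroids, close to (0, 0) and (5, 5)
table(km$cluster)  # cluster sizes, roughly 50/50
```

The same kmeans() call, with more centroids and on pixel vectors instead of toy points, is what we run on the Landsat scene below.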

Let’s give names to the Landsat bands

band_names <- c('ultra-blue', 'blue', 'green', 'red', 'NIR', 'SWIR1', 'SWIR2')

Open the Landsat image and crop it to the area of Manchester.

# Create list of bands file paths
lst_names <- list.files("Manchester_RS_Notebook_and_Data/", full.names = TRUE, pattern = "*[1-7].TIF$")
lst_names <- lst_names[c(1,3,4,5,6,7,8)]
b_stack0 <- stack(lst_names)
lst_names
## [1] "Manchester_RS_Notebook_and_Data//LC08_L1TP_203023_20190513_20190521_01_T1_B1.TIF"
## [2] "Manchester_RS_Notebook_and_Data//LC08_L1TP_203023_20190513_20190521_01_T1_B2.TIF"
## [3] "Manchester_RS_Notebook_and_Data//LC08_L1TP_203023_20190513_20190521_01_T1_B3.TIF"
## [4] "Manchester_RS_Notebook_and_Data//LC08_L1TP_203023_20190513_20190521_01_T1_B4.TIF"
## [5] "Manchester_RS_Notebook_and_Data//LC08_L1TP_203023_20190513_20190521_01_T1_B5.TIF"
## [6] "Manchester_RS_Notebook_and_Data//LC08_L1TP_203023_20190513_20190521_01_T1_B6.TIF"
## [7] "Manchester_RS_Notebook_and_Data//LC08_L1TP_203023_20190513_20190521_01_T1_B7.TIF"
b_stack0
## class      : RasterStack 
## dimensions : 7981, 7891, 62978071, 7  (nrow, ncol, ncell, nlayers)
## resolution : 30, 30  (x, y)
## extent     : 451185, 687915, 5763885, 6003315  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=30 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
## names      : LC08_L1TP//1_01_T1_B1, LC08_L1TP//1_01_T1_B2, LC08_L1TP//1_01_T1_B3, LC08_L1TP//1_01_T1_B4, LC08_L1TP//1_01_T1_B5, LC08_L1TP//1_01_T1_B6, LC08_L1TP//1_01_T1_B7 
## min values :                     0,                     0,                     0,                     0,                     0,                     0,                     0 
## max values :                 65535,                 65535,                 65535,                 65535,                 65535,                 65535,                 65535
# Crop image using row and column numbers:
band_stack <- crop(b_stack0, extent(b_stack0, 2314, 2814, 3047, 3547))

names(band_stack) <- band_names

# Plot all bands
plot(band_stack, legend = FALSE, main=band_names, col=grey(1:100/100))

Let’s look at RGB images made from different combinations of spectral bands. This will help us interpret the classification results.

plotRGB(band_stack, 4,3,2, stretch = 'lin', colNA='white', main='True Color')

plotRGB(band_stack, 5,4,3, stretch = 'lin', colNA='white', main='False Color')

plotRGB(band_stack, 5,6,7, stretch = 'lin', colNA='white', main='Bands 5,6 and 7')

We can see that vegetation is easier to recognize on the false color composite, where the Near Infrared (NIR) band is placed into the red channel. Healthy vegetation absorbs radiation in the red range and strongly reflects in NIR. The RGB combination of infrared bands shows vegetation as orange because the Short Wave Infrared (SWIR) bands are related to leaf water content. The difference from the false color composite is that we can recognize not just vegetation but also different types of vegetation according to their water content.
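As a quick check of this red/NIR relationship, we can compute the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red), from the band_stack defined above (a short sketch; high NDVI values indicate healthy vegetation):

```r
# NDVI from the named layers of band_stack (see band_names above)
ndvi <- (band_stack[['NIR']] - band_stack[['red']]) / (band_stack[['NIR']] + band_stack[['red']])
plot(ndvi, main = 'NDVI', col = rev(terrain.colors(10)))
```
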

Convert the stack of 2D rasters to a matrix of pixel vectors, because the classification functions do not work with 2D rasters.

b_vec <- getValues(band_stack)

Look at the dimensionality of the resulting matrix.

str(b_vec)
##  int [1:251001, 1:7] 10199 10160 10092 10096 10153 10164 10144 10088 10028 10018 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:7] "ultra.blue" "blue" "green" "red" ...

Set seed

set.seed(99)

Do classification!

# Omit missing (NA) values with na.omit, set the number of centroids to ten and the maximum number of iterations to 500.
kmncluster <- kmeans(na.omit(b_vec), centers = 10, iter.max = 500, nstart = 5, algorithm="Lloyd")

Look at the return value (str)

str(kmncluster)
## List of 9
##  $ cluster     : int [1:251001] 2 2 2 2 10 4 6 3 3 3 ...
##  $ centers     : num [1:10, 1:7] 12892 10361 10396 10506 11903 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:10] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:7] "ultra.blue" "blue" "green" "red" ...
##  $ totss       : num 5.27e+12
##  $ withinss    : num [1:10] 7.74e+10 1.22e+11 1.08e+11 6.42e+10 7.90e+10 ...
##  $ tot.withinss: num 8.47e+11
##  $ betweenss   : num 4.42e+12
##  $ size        : int [1:10] 9285 22897 3207 31026 20436 42675 58413 25058 2391 35613
##  $ iter        : int 260
##  $ ifault      : NULL
##  - attr(*, "class")= chr "kmeans"

Convert the returned vector to a 2D raster. The $ operator extracts the “cluster” element from the returned list (see the previous chunk).

knr <- setValues(band_stack[[1]], kmncluster$cluster)

Set up colors for classes

mycolor <- c("#fef65b","#ff0000", "#daa520","#0000ff","#0000ff","#00ff00","#cbbeb5", "#c3ff5b", "#ff7373", "#00ff00", "#808080")

CLARA classification

The CLARA (Clustering LARge Applications) algorithm is based on the Partitioning Around Medoids (PAM) algorithm, an implementation of the k-medoids method (Kaufman and Rousseeuw, 2005). CLARA applies PAM to samples of the data, which makes it suitable for large datasets.

The task of the k-medoids algorithm is to minimize the cost function \(J=\min\sum\limits_{i=1}^n\sum\limits_{j=1}^nd(i,j)z_{i,j}\), where \(d(i,j)\) is the dissimilarity between objects i and j and \(z_{i,j}=1\) if object i is assigned to the cluster represented by medoid j (0 otherwise). Unlike k-means centroids, medoids are actual observations, which makes the method more robust to outliers.
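As a minimal toy sketch (separate from the Landsat workflow), clara() from the cluster package can be run on synthetic data; note that the returned medoids are actual observations rather than averaged centroids:

```r
library(cluster)  # provides clara()

set.seed(1)
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),  # 50 points near (0, 0)
             matrix(rnorm(100, mean = 5), ncol = 2))  # 50 points near (5, 5)
cc <- clara(toy, k = 2)
cc$medoids           # medoids are real rows of 'toy'
table(cc$clustering) # cluster sizes, roughly 50/50
```
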

For the CLARA classification we use the same vector ‘b_vec’ as for the k-means classification.

cccluster <- clara(na.omit(b_vec), k = 10)
# the clara object stores assignments in 'clustering' ('$cluster' also works via partial matching)
clara_img <- setValues(band_stack[[1]], cccluster$clustering)

Plot the results

# Plot the results of both clustering methods
plot(knr, main = 'K-means classification', col = mycolor )

plot(clara_img, main = 'Clara classification', col = mycolor)

We can see that the CLARA method separated different types of vegetation, whereas K-Means grouped them into one cluster. CLARA also separated the urban area better. At the same time, K-Means separated water into its own cluster.

Supervised classification (10 min)

Supervised classification is one of the main tools for working with remote sensing data. There are many different methods, each with its own advantages and disadvantages. In this exercise we will use the Classification and Regression Trees (CART) (Lawrence, 2001) and Random Forest (Ho, 1998) methods.

Classification and Regression Trees (CART)

Land Cover Map 2015 (LCM2015) will be used as reference data. LCM2015 is made using satellite data and cartographic information and provides land cover information for the UK at 25 m spatial resolution. LCM2015 has the following classes:

0 Unclassified, 1 Broadleaved Woodland, 2 Coniferous Woodland, 3 Arable and Horticulture, 4 Improved Grassland, 5 Neutral Grassland, 6 Calcareous Grassland, 7 Acid grassland, 8 Fen, Marsh and Swamp, 9 Heather, 10 Heather grassland, 11 Bog, 12 Inland Rock, 13 Saltwater, 14 Freshwater, 15 Supra-littoral Rock, 16 Supra-littoral Sediment, 17 Littoral Rock, 18 Littoral sediment, 19 Saltmarsh, 20 Urban, 21 Suburban.

Set up class names and colors

# Get LCM data
lcm <- brick('lcm2015_landsat_stack.tif') 

# Crop LCM data using row and column numbers:
lcm <- crop(lcm, extent(lcm, 2314, 2814, 3047, 3547))


names(lcm) <- c("lcm2015", "prob")

# The class names
class_names <- c("Broadleaved Woodland", "Coniferous Woodland", "Arable and Horticulture", "Improved Grassland", "Neutral Grassland", "Calcareous Grassland", "Acid grassland", "Fen, Marsh and Swamp", "Heather", "Heather grassland", "Bog", "Inland Rock", "Saltwater", "Freshwater", "Supra-littoral Rock", "Supra-littoral Sediment", "Littoral Rock", "Littoral sediment", "Saltmarsh", "Urban", "Suburban")

# Map the class values present in the image to their names
u_val <- unique(lcm[[1]])
lcmclass <- class_names[u_val]

classdf <- data.frame(classvalue1 = u_val, classnames1 = lcmclass)

Plot the LCM reference data

# Print which LCM classes exist in our area
unique(lcm[[1]])
## [1]  1  3  4  5  6 12 14 20 21
plot(lcm)

Ratify (RAT = “Raster Attribute Table”) the LCM2015 layer. This defines the RasterLayer as a categorical variable; such a layer is linked to other values via a Raster Attribute Table (RAT).

lcm_rat <- lcm[[1]]
lcm_rat <- ratify(lcm_rat)
rat <- levels(lcm_rat)[[1]]
rat$landcover <- lcmclass
levels(lcm_rat) <- rat

Generate sample sites

# Set the random number generator to reproduce the results
set.seed(99)

# Generate a stratified random sample of training site locations (100 per class)
samp <- sampleStratified(lcm_rat, size = 100, na.rm = TRUE, sp = TRUE)
## Warning in .local(x, size, ...): only 7 cells found for stratum 6
## Warning in .local(x, size, ...): fewer samples than requested found for
## stratum: 6
samp
## class       : SpatialPointsDataFrame 
## features    : 807 
## extent      : 542580, 557580, 5918940, 5933910  (xmin, xmax, ymin, ymax)
## crs         : +proj=utm +zone=30 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 2
## names       :   cell, lcm2015 
## min values  :     40,       1 
## max values  : 250449,      21
# Count the sampled points per class (the column is named 'lcm2015')
table(samp$lcm2015)

Load the LCM color table

We use the color scheme provided with LCM in the lcm2015gb25m.tif-Band_1.clr file.

cl = read.table('LCM2015/lcm-2015-25m_3034373/lcm2015gb25m.tif-Band_1.clr', header = FALSE, sep = "", dec = ".")

# make an empty vector
col_vector <- c()

# load normalized colors to col_vector
for (i in 2:dim(cl)[1]){
  col_vector <- append(col_vector, rgb(cl[i, 2]/255, cl[i, 3]/255, cl[i,4]/255))
}

Show training sites distribution (white crosses)

plt <- levelplot(lcm_rat, col.regions = col_vector[u_val], main = 'Training Sites')
print(plt + layer(sp.points(samp, pch = 3, cex = 0.5, col = "white")))

Extract values for sites

# Extract the layer values for the locations
sampvals <- extract(band_stack, samp, df = TRUE)

# sampvals no longer has the spatial information. To keep it, use the sp = TRUE argument in the extract function.
# drop the ID column
sampvals <- sampvals[, -1]

# combine the class information with extracted values
sampdata <- data.frame(classvalue = samp@data$lcm2015, sampvals)

Show the CART classification tree

# Train the model
cart <- rpart(as.factor(classvalue)~., data=sampdata, method = 'class', minsplit = 5)

# Plot the trained classification tree
plot(cart, uniform=TRUE, main="Classification Tree")
text(cart, cex = 0.8)

Perform Random Forest classification and show the output

# Train the model
rforest <- randomForest(as.factor(classvalue)~., data = sampdata)  # 'method' and 'minsplit' are rpart arguments, not used by randomForest

# Plot the model error as a function of the number of trees
plot(rforest)


CART: predict over the cropped scene using the trained model

pr <- predict(band_stack, cart, type='class')
pr <- ratify(pr)

Random Forest: predict over the cropped scene using the trained model

pr_forest <- predict(band_stack, rforest, type='class')
pr_forest <- ratify(pr_forest)

CART: keep only the class names that exist in the classification results

rat1 <- levels(pr)[[1]]
# Keep only the names of the classes present in the prediction
class11 <- class_names[rat1$ID]
class11
## [1] "Broadleaved Woodland" "Improved Grassland"   "Inland Rock"         
## [4] "Freshwater"           "Urban"                "Suburban"

Random Forest: keep only the class names that exist in the classification results (the same as above)

rat1_forest <- levels(pr_forest)[[1]]
class11_forest <- class_names[rat1_forest$ID]
class11_forest
## [1] "Broadleaved Woodland"    "Arable and Horticulture"
## [3] "Improved Grassland"      "Neutral Grassland"      
## [5] "Calcareous Grassland"    "Inland Rock"            
## [7] "Freshwater"              "Urban"                  
## [9] "Suburban"

Here we see that Random Forest has found more land cover classes than CART.

Plot the results

#CART
rat1$legend <- class11
levels(pr) <- rat1

levelplot(pr, maxpixels = 1e6,
col.regions = col_vector[rat1$ID],
scales=list(draw=FALSE),
main = "CART classification of Landsat 8")

#Random Forest
rat1_forest$legend <- class11_forest
levels(pr_forest) <- rat1_forest

levelplot(pr_forest, maxpixels = 1e6,
col.regions = col_vector[rat1_forest$ID],
scales=list(draw=FALSE),
main = "Random Forest classification of Landsat 8")

Here we can see that two classes found only by Random Forest (‘Calcareous Grassland’ and ‘Neutral Grassland’) are very small and can be neglected. The class ‘Arable and Horticulture’ is located in the top left corner of the Training Sites image, and Random Forest has found it around the same area. CART has classified some urban areas as ‘Inland Rock’, whereas Random Forest correctly grouped these pixels as ‘Urban’.

Results evaluation

# Keep only the samples whose class appears in the CART prediction
sampdata11 <- sampdata[sampdata$classvalue %in% rat1$ID, ]
set.seed(99)
j <- kfold(sampdata11, k = 5, by=sampdata11$classvalue)
table(j)
## j
##  1  2  3  4  5 
## 18 23 19 23 18
sampdata11[sampdata11$classvalue==6,]
## [1] classvalue ultra.blue blue       green      red        NIR       
## [7] SWIR1      SWIR2     
## <0 rows> (or 0-length row.names)
x <- list()
for (k in 1:5) {
  train <- sampdata11[j!= k, ]
  test <- sampdata11[j == k, ]
  cart <- rpart(as.factor(classvalue)~., data=train, method = 'class', minsplit = 5)
  pclass <- predict(cart, test, type='class')
  # create a data.frame using the reference and prediction
  x[[k]] <- cbind(test$classvalue, as.integer(pclass))
}
y <- do.call(rbind, x)
y <- data.frame(y)
colnames(y) <- c('observed', 'predicted')

conmat <- table(y)

# change the name of the classes
colnames(conmat) <- class11
rownames(conmat) <- class11

conmat
##                       predicted
## observed               Broadleaved Woodland Improved Grassland Inland Rock
##   Broadleaved Woodland                    9                  2           0
##   Improved Grassland                      3                  9           1
##   Inland Rock                             2                  0          13
##   Freshwater                              2                  1           2
##   Urban                                   2                  1           1
##   Suburban                                1                  0           1
##                       predicted
## observed               Freshwater Urban Suburban
##   Broadleaved Woodland          5     1        0
##   Improved Grassland            2     0        1
##   Inland Rock                   1     0        1
##   Freshwater                   10     1        1
##   Urban                         1     8        4
##   Suburban                      2     3       10

Compute the overall accuracy and the “Kappa” statistic.

Overall accuracy:

# number of cases
n <- sum(conmat)
n
## [1] 101
# number of correctly classified cases per class
diag <- diag(conmat)
# Overall Accuracy
OA <- sum(diag) / n
OA
## [1] 0.5841584

Kappa:

# observed (true) cases per class
rowsums <- apply(conmat, 1, sum)
p <- rowsums / n
# predicted cases per class
colsums <- apply(conmat, 2, sum)
q <- colsums / n
expAccuracy <- sum(p*q)
kappa <- (OA - expAccuracy) / (1 - expAccuracy)
kappa
## [1] 0.500765
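In formula form, the statistic computed above is Cohen’s kappa. With \(n_{ii}\) the diagonal entries of the confusion matrix, \(n_{i\cdot}\) the row sums and \(n_{\cdot i}\) the column sums:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \frac{1}{n}\sum\limits_{i} n_{ii} \;(=\mathrm{OA}), \qquad
p_e = \sum\limits_{i} \frac{n_{i\cdot}}{n}\cdot\frac{n_{\cdot i}}{n}
```

Here \(p_o\) is the observed agreement (the overall accuracy) and \(p_e\) is the agreement expected by chance, which is exactly what the expAccuracy variable computes.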

Producer and user accuracy

# Producer accuracy
PA <- diag / colsums
# User accuracy
UA <- diag / rowsums
outAcc <- data.frame(producerAccuracy = PA, userAccuracy = UA)
outAcc
##                      producerAccuracy userAccuracy
## Broadleaved Woodland        0.4736842    0.5294118
## Improved Grassland          0.6923077    0.5625000
## Inland Rock                 0.7222222    0.7647059
## Freshwater                  0.4761905    0.5882353
## Urban                       0.6153846    0.4705882
## Suburban                    0.5882353    0.5882353
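With rows as observed and columns as predicted classes (as in conmat above), the quantities computed in this chunk are:

```latex
PA_j = \frac{n_{jj}}{n_{\cdot j}}, \qquad
UA_j = \frac{n_{jj}}{n_{j\cdot}}
```

where \(n_{\cdot j}\) and \(n_{j\cdot}\) are the j-th column and row sums of the confusion matrix. Note that terminology varies: many texts define producer’s accuracy relative to the reference (row) totals instead.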

References

  1. Fränti, P., Sieranoja, S. How much can k-means be improved by using better initialization and repeats? Pattern Recognition, Volume 93, September 2019, Pages 95-112
  2. Kaufman, L., Rousseeuw, P. J. 2005. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, Inc
  3. Ho T.K. The Random Subspace Method for Constructing Decision Forests. 1998. IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844.
  4. Lawrence, R. L. Rule-based classification systems using classification and regression tree (CART) analysis. Photogrammetric Engineering & Remote Sensing. Vol. 67, No. 10, 2001, pp. 1137-1142.
  5. Tou, J. T., Gonzalez, R. C. 1974. Pattern Recognition Principles, Addison-Wesley Publishing Company, Reading, Massachusetts.